{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# 1. Kylinpy 简介\n",
    "\n",
    "[Kylinpy](https://github.com/Kyligence/kylinpy) 初衷是作为 SQLAlchemy Dialect 而开发设计的. 随着 KI 的使用, 逐渐演化成 Apache Kylin / KE3 / KE4 `Datasource` 服务层. 随着 Airflow 支持, 又加入了统一的 Job 服务\n",
    "\n",
    "## 特性\n",
    "- 提供 SQLAlchemy Dialect\n",
    "- 提供 Pandas read_sql 接口\n",
    "- 针对 Apache Kylin / Kyligence Enterprise 3 暴露 Cube 作为 Datasource 服务层\n",
    "- 针对 Kyligence Enterprise 4 暴露 Model 作为 Datasource 服务层\n",
    "- 尽力做到不同版本 Kylin / KE 使用同样 API 访问\n",
    "- Python 2.7 / Python3 兼容\n",
    "- 现在版本依然可以做到无额外依赖, 纯 Python 实现, 安装部署很方便\n",
    "- 高测试覆盖率(覆盖率保持 80% 以上)\n",
    "\n",
    "## Online 安装\n",
    "```\n",
    "pip install kylinpy\n",
    "```\n",
    "\n",
    "## Offline 安装\n",
    "\n",
    "```\n",
    "# download from https://pypi.org/project/kylinpy/#files\n",
    "pip install kylinpy-2.8.1.tar.gz\n",
    "```"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# 2. 作为 SQLAlchemy Dialect"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [],
   "source": [
    "from sqlalchemy import create_engine, inspect\n",
    "\n",
    "\"\"\"\n",
    "在 DSN 中可以很方便的指定 Kylin / KE 版本\n",
    "\n",
    "Kylin Instance\n",
    "engine = create_engine(\"kylin://admin:KYLIN@10.1.2.31:7070/learn_kylin\")\n",
    "\n",
    "KE4 Instance\n",
    "engine = create_engine(\"kylin://admin:KYLIN@10.1.2.31:7070/learn_kylin?version=v4\")\n",
    "\"\"\"\n",
    "\n",
    "engine = create_engine(\"kylin://admin:KYLIN@10.1.2.31:7070/learn_kylin?version=v2\")\n",
    "sql = 'select * from kylin_sales limit 10'\n",
    "results = engine.execute(sql)\n",
    "\n",
    "#results.fetchall()\n",
    "#inspector = inspect(engine)\n",
    "#inspector.get_table_names('')\n",
    "#inspector.get_schema_names()\n",
    "#inspector.get_columns('DEFAULT.KYLIN_SALES')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 更常用的做法是把返回结果集定义成 Pandas Dataframe"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "   kylin_sales.trans_id kylin_sales.part_dt kylin_sales.lstg_format_name  \\\n",
      "0                     0          2012-12-14                       Others   \n",
      "1                     1          2012-08-28                       Others   \n",
      "2                     2          2012-02-16                         ABIN   \n",
      "3                     3          2013-10-19                   FP-non GTC   \n",
      "4                     4          2012-10-22                       Others   \n",
      "5                     5          2013-01-25                       FP-GTC   \n",
      "6                     6          2013-04-04                       Others   \n",
      "7                     7          2012-04-11                       Others   \n",
      "8                     8          2013-01-19                         ABIN   \n",
      "9                     9          2012-04-30                   FP-non GTC   \n",
      "\n",
      "   kylin_sales.leaf_categ_id  kylin_sales.lstg_site_id  \\\n",
      "0                      88750                         0   \n",
      "1                     175750                         0   \n",
      "2                     148324                        15   \n",
      "3                      37831                         0   \n",
      "4                     140746                       100   \n",
      "5                      16509                         0   \n",
      "6                        963                         0   \n",
      "7                      15687                         0   \n",
      "8                      60606                         3   \n",
      "9                     106246                         0   \n",
      "\n",
      "   kylin_sales.slr_segment_cd  kylin_sales.price  kylin_sales.item_count  \\\n",
      "0                          11            36.2828                       4   \n",
      "1                          13            23.8563                      20   \n",
      "2                          13            88.3418                      18   \n",
      "3                          13            47.3015                       3   \n",
      "4                          11            83.4540                      16   \n",
      "5                         -99            22.9896                      11   \n",
      "6                          13            88.5907                      11   \n",
      "7                          15            88.1940                      19   \n",
      "8                          13            77.9727                      13   \n",
      "9                          14            52.2950                      17   \n",
      "\n",
      "   kylin_sales.seller_id  kylin_sales.buyer_id kylin_sales.ops_user_id  \\\n",
      "0               10000349              10002313                 ANALYST   \n",
      "1               10000927              10004376                 ANALYST   \n",
      "2               10000005              10006710                   ADMIN   \n",
      "3               10000209              10003717                 ANALYST   \n",
      "4               10000154              10006076                   ADMIN   \n",
      "5               10000372              10007436                   ADMIN   \n",
      "6               10000648              10009869                 MODELER   \n",
      "7               10000866              10000400                   ADMIN   \n",
      "8               10000936              10000687                 MODELER   \n",
      "9               10000047              10009223                 MODELER   \n",
      "\n",
      "  kylin_sales.ops_region  \n",
      "0                Beijing  \n",
      "1                Beijing  \n",
      "2               Shanghai  \n",
      "3                Beijing  \n",
      "4               Shanghai  \n",
      "5               Shanghai  \n",
      "6               Hongkong  \n",
      "7               Shanghai  \n",
      "8               Hongkong  \n",
      "9               Hongkong  \n",
      "kylin_sales.trans_id              int64\n",
      "kylin_sales.part_dt              object\n",
      "kylin_sales.lstg_format_name     object\n",
      "kylin_sales.leaf_categ_id         int64\n",
      "kylin_sales.lstg_site_id          int64\n",
      "kylin_sales.slr_segment_cd        int64\n",
      "kylin_sales.price               float64\n",
      "kylin_sales.item_count            int64\n",
      "kylin_sales.seller_id             int64\n",
      "kylin_sales.buyer_id              int64\n",
      "kylin_sales.ops_user_id          object\n",
      "kylin_sales.ops_region           object\n",
      "dtype: object\n"
     ]
    }
   ],
   "source": [
    "import pandas as pd\n",
    "\n",
    "df = pd.read_sql(sql, engine)\n",
    "print(df)\n",
    "print(df.dtypes)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# 2. Apache Kylin / KE3 / KE4 抽象 `Datasource` 服务层\n",
    "\n",
    "- 针对 Apache Kylin / Kyligence Enterprise 3 暴露 Cube 作为 Datasource 服务层\n",
    "- 针对 Kyligence Enterprise 4 暴露 Model 作为 Datasource 服务层"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "from kylinpy import Kylin\n",
    "\n",
    "# 1. 实例化 Kylin 对象, 依然可以很方便的通过切换构造函数的 verserion 参数, 而指定不同版本的 Kylin 对象\n",
    "kylin = Kylin(\n",
    "    host='<please input KE4 hostname without scheme>',\n",
    "    port=7070,\n",
    "    username='ADMIN',\n",
    "    password='KYLIN',\n",
    "    project='demo',\n",
    "    version='v4',\n",
    ")\n",
    "# 2. 取得一个 Datasource, 这里的 Datasource, 在 KE3 中是Cube, 在 KE4 中是以个 Modle, 在 KI 中又是一个自定义格式的dataset.灵活的设计, 封装了数据源的差异\n",
    "ds = kylin.get_datasource('test')\n",
    "\n",
    "\n",
    "#print(\"datasource name: \", ds.name)\n",
    "# datasource 最重要的属性是 dimensions 和 measures\n",
    "#print(\"dimension objects: \", ds.dimensions)\n",
    "\n",
    "# 每一个 dimension 都有一些具体的属性\n",
    "# customer_c_city = ds.dimensions[0]\n",
    "# print(\"dimension name: \", customer_c_city.name)\n",
    "# print(\"dimension type: \", customer_c_city.datatype)\n",
    "# print(\"dimension verbose: \", customer_c_city.verbose)\n",
    "\n",
    "\n",
    "# 还可以很方便查看 dimension 所关联的源表信息\n",
    "# print(\"source table schema of DIMENSION: \", customer_c_city.table.scheme)\n",
    "# print(\"source table name of DIMENSION: \", customer_c_city.table.name)\n",
    "\n",
    "# measure 信息也是一样的\n",
    "print(\"measure objects: \", ds.measures)\n",
    "measure1 = ds.measures[0]\n",
    "print(\"measure name: \", measure1.name)\n",
    "print(\"measure verbose: \", measure1.verbose)\n",
    "print(\"measure expression: \", measure1.expression)\n",
    "\n",
    "str(ds.from_clause)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# 3. 以 Datasource 为源点, 可以有很多实用方法"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# 列出这个 model 的聚合组. 很遗憾, 这不是一个在手册中出现的API, 所以会在打印出 Warning message\n",
    "ds.list_index_rules()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# 清除聚合组\n",
    "ds.clear_up_index_rules()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# 很友好的方式增加聚合组\n",
    "ds.add_index_rule(\n",
    "    load_data=False,  # whether or not auto build index\n",
    "    include=['CUSTOMER.C_CITY'],  # dimenstion name in list\n",
    "    mandatory=['CUSTOMER.C_CITY'],  # dimension name in list\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# 列出所有 index\n",
    "ds.list_indexes()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# 还有一些实用方法等待您去发现\n",
    "[m for m in dir(ds) if not m.startswith('_')]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# 4. Kylinpy 还在持续迭代中, 很多开源用户想要使用 Python 连接 Apache Kylin 都在使用这个项目.\n",
    "- 稳定的 Superset 客户端\n",
    "- 稳定的 Pandas 客户端\n",
    "- Airflow 客户端"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.8"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}
